Skip to content

feat(ai): add $ai_completion_id and $ai_provider_metadata to OpenAI events#3306

Merged
carlos-marchal-ph merged 4 commits into
PostHog:mainfrom
johnsykim:feat/ai-completion-id
May 29, 2026
Merged

feat(ai): add $ai_completion_id and $ai_provider_metadata to OpenAI events#3306
carlos-marchal-ph merged 4 commits into
PostHog:mainfrom
johnsykim:feat/ai-completion-id

Conversation

@johnsykim

@johnsykim johnsykim commented Mar 31, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds two new auto-captured properties to $ai_generation events for the OpenAI and Azure OpenAI wrappers, enabling correlation between PostHog events and OpenAI's Logs dashboard (platform.openai.com/logs/{completion_id}):

  • $ai_completion_id — provider-assigned response ID (chatcmpl-…, resp_…). Lives at the top of the $ai_* namespace because response IDs generalize across providers.
  • $ai_provider_metadata — OpenAI-specific fields (system_fingerprint, request_id) collected under a single blob rather than polluting the shared, provider-agnostic schema. Sets the pattern for Anthropic / Gemini wrappers to follow.

Both options are now accepted by the public captureAiGeneration primitive, so external instrumentation produces identical events.

Revision notes (rebased against the new captureAiGeneration primitive)

This branch was originally written against sendEventToPosthog. Since #3499 landed, every wrapper funnels through captureAiGeneration, so the implementation was rewritten cleanly on top of the new primitive.

Addresses Richard's review:

  1. Move OpenAI-specific fields under $ai_provider_metadata. system_fingerprint and request_id are OpenAI-only concepts and now live under $ai_provider_metadata instead of at the top level.
  2. Keep $ai_completion_id at the top level. Response IDs generalize across providers, so this stays in the shared schema.
  3. Fix the TS2353 build error. mockOpenAiChatResponse is now typed as ChatCompletion & { _request_id?: string }. pnpm build passes.
  4. Consolidate the duplicated (result as any)._request_id cast. Replaced all six call sites with a single extractRequestId(result) helper in packages/ai/src/openai/utils.ts, with an explanatory doc comment in one place.
  5. ⏭️ Streaming request_id is deferred to a follow-up as suggested — it requires threading .withResponse() through the streaming flow, which is a larger refactor than this PR. Streaming chats still capture $ai_completion_id and system_fingerprint from accumulated chunks.

Also addresses Greptile's outstanding comment about Responses API request_id coverage: the responses.parse and responses.create web-search tests both now exercise the positive case.

What it captures, by path

Path $ai_completion_id $ai_provider_metadata
Chat Completions — non-streaming result.id system_fingerprint, request_id
Chat Completions — streaming accumulated chunk.id system_fingerprint (from chunks)
Responses API — non-streaming (create) result.id request_id
Responses API — streaming accumulated chunk.response.id
Responses API — parse result.id request_id

Same matrix for the Azure OpenAI wrapper.

Test plan

  • pnpm lint clean
  • pnpm build clean (TS2353 fixed)
  • New unit tests for the extractRequestId / buildProviderMetadata helpers (packages/ai/tests/openai-utils.test.ts) — 11 cases including the "empty → undefined" branch
  • OpenAI basic completion asserts $ai_completion_id + $ai_provider_metadata with both fields
  • OpenAI basic streaming completion asserts $ai_completion_id + $ai_provider_metadata with system_fingerprint only
  • OpenAI responses parse asserts $ai_completion_id + $ai_provider_metadata.request_id
  • OpenAI responses.create (web-search test) asserts $ai_completion_id + $ai_provider_metadata.request_id
  • Azure OpenAI basic completion mirrors the OpenAI assertions
Original background & motivation

Why this matters

The @posthog/ai package wraps the OpenAI SDK and auto-emits $ai_generation events. Without response.id, there's no way to navigate from a PostHog $ai_generation event to the corresponding entry in OpenAI's Logs dashboard (platform.openai.com/logs/{completion_id}).

  • Debugging. When investigating a cost spike or error, engineers need to jump from PostHog analytics → OpenAI's detailed log (full prompt/response, token breakdown, latency).
  • Correlation. Without response.id, joining PostHog events with OpenAI's own logging system requires multi-hop lookups via entity IDs.
  • Support. OpenAI support tickets reference x-request-id — having this on the PostHog event makes it trivial to file tickets with the right correlation ID.

References

  • OpenAI Logs dashboard: https://platform.openai.com/logs?api=chat-completions
  • Completion ID formats: chatcmpl-{base62} (Chat Completions), resp_{base62} (Responses)
  • OpenAI SDK _request_id: attached from the x-request-id response header by the openai npm package

@vercel

vercel Bot commented Mar 31, 2026

Copy link
Copy Markdown

@johnsykim is attempting to deploy a commit to the PostHog Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps

greptile-apps Bot commented Mar 31, 2026

Copy link
Copy Markdown
Contributor
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/ai/tests/openai-utils.test.ts:29-38
These two test cases each contain multiple inline assertions that should be parameterized. The project explicitly prefers `it.each` for multi-case checks, and the `extractRequestId` tests in the same file already follow this pattern correctly.

```suggestion
  it.each([
    [{ systemFingerprint: 'fp_1' }, { system_fingerprint: 'fp_1' }],
    [{ requestId: 'req_1' }, { request_id: 'req_1' }],
  ])('omits keys whose value is missing: %p → %p', (input, expected) => {
    expect(buildProviderMetadata(input)).toEqual(expected)
  })

  it.each([
    [{}],
    [{ systemFingerprint: undefined, requestId: undefined }],
    [{ systemFingerprint: null, requestId: null }],
  ])('returns undefined when there is nothing to report: %p', (input) => {
    expect(buildProviderMetadata(input)).toBeUndefined()
  })
```

Reviews (4): Last reviewed commit: "feat(ai): add $ai_completion_id and $ai_..." | Re-trigger Greptile

Comment thread packages/ai/src/openai/index.ts Outdated
Comment thread packages/ai/src/openai/index.ts Outdated
@johnsykim johnsykim marked this pull request as draft March 31, 2026 21:36
@johnsykim

Copy link
Copy Markdown
Contributor Author

@greptile review

@johnsykim johnsykim marked this pull request as ready for review March 31, 2026 21:58
@github-actions

github-actions Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@github-actions github-actions Bot added the stale label Apr 8, 2026
@johnsykim

Copy link
Copy Markdown
Contributor Author

@haacked @danielbachhuber Sorry for random tagging (please tell me if there is a better PostHog POC to reach out to). What's the best way to get PostHog team's attention on this PR? Is there a certain review and release process I have to follow?

@danielbachhuber

Copy link
Copy Markdown
Contributor

@johnsykim Hi! I'm no longer with PostHog but, based on the history of the files, @richardsolomou, @carlos-marchal-ph, or @Radu-Raicea might be able to give you a review.

@github-actions github-actions Bot removed the stale label Apr 10, 2026
@github-actions

Copy link
Copy Markdown
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@richardsolomou richardsolomou left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together — the motivation (correlating PostHog events to OpenAI's logs dashboard) is a real pain point and the implementation is clean. Chatted with the team about the schema shape, and we'd like to adjust the approach before merging.

Design ask

  1. Move OpenAI-specific fields under a $ai_provider_metadata blob$ai_* has been provider-agnostic so far, and system_fingerprint / request_id are OpenAI-only concepts. Rather than living at the top level, they should go under a new $ai_provider_metadata property so each provider wrapper can surface its own metadata without polluting the shared namespace. Something like:
    $ai_provider_metadata: {
      system_fingerprint: '...',
      request_id: '...',
    }
    This also sets the pattern for Anthropic / Gemini wrappers to follow later.
  2. Keep $ai_completion_id at the top level — response IDs generalize cleanly (Anthropic and Gemini both have equivalents), so this one belongs in the shared schema.

On the app side: we think rendering a link to the OpenAI log from the event inspector is valuable and we'll pick that up separately once this is merged!

Blocking

  1. Build fails with TS2353 on packages/ai/tests/openai.test.ts:292_request_id isn't a public property of ChatCompletion, so the literal is rejected under strict TS. pnpm build fails on this branch and passes on main. CI for external PRs only runs Wiz/Graphite/Vercel, which is why this slipped past. Easiest fix:
    let mockOpenAiChatResponse: ChatCompletion & { _request_id?: string } = {} as ChatCompletion

Suggestions

  1. Duplicated (result as any)._request_id pattern — appears 6 times across packages/ai/src/openai/index.ts and packages/ai/src/openai/azure.ts. A small helper (extractRequestId(result)) would consolidate the cast and the explanatory comment in one place.
  2. Streaming paths don't capture requestId — only completionId / systemFingerprint are pulled from chunks. The OpenAI SDK exposes _request_id on the aggregated stream response (via .withResponse()), so there's room to capture it for streams too. Fine as a follow-up.

Let me know if anything's unclear — happy to answer questions as you work through the restructure.

@github-actions

Copy link
Copy Markdown
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@johnsykim

Copy link
Copy Markdown
Contributor Author

Thanks a lot @richardsolomou. I'll raise a revision sometime soon.

@github-actions

github-actions Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@github-actions github-actions Bot added the stale label May 7, 2026
@richardsolomou richardsolomou requested a review from a team May 8, 2026 07:55
@github-actions github-actions Bot removed the stale label May 8, 2026
@carlos-marchal-ph

Copy link
Copy Markdown
Contributor

Hi @johnsykim, taking this over from @richardsolomou! I agree with Richard's review. I'm going to mark this as a draft in the meantime, un-draft it yourself when it's ready for us again, and we'll get pinged to take a second look :)

@carlos-marchal-ph carlos-marchal-ph marked this pull request as draft May 8, 2026 11:32
@github-actions

Copy link
Copy Markdown
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@github-actions github-actions Bot added the stale label May 18, 2026
…vents

Adds two new auto-captured properties to `$ai_generation` events emitted by
the OpenAI and Azure OpenAI wrappers, enabling direct correlation between
PostHog events and OpenAI's Logs dashboard
(`platform.openai.com/logs/{completion_id}`):

- `$ai_completion_id` — provider-assigned response ID (e.g. `chatcmpl-…`,
  `resp_…`). Generalises across providers, so it lives at the top of the
  `$ai_*` namespace.
- `$ai_provider_metadata` — OpenAI-specific fields (`system_fingerprint`,
  `request_id`) collected under a single blob rather than polluting the
  shared schema. Sets the pattern for Anthropic / Gemini wrappers to follow.

Both options are accepted by the public `captureAiGeneration` primitive, so
external instrumentation produces identical events. Coverage spans
non-streaming and streaming Chat Completions plus Responses API
(`create` + `parse`) for both OpenAI and Azure OpenAI. Streaming paths
extract `id` / `system_fingerprint` from accumulated chunks; the
`x-request-id` header is read via the OpenAI SDK's semi-private
`_request_id` field through a small typed helper. Streaming
`request_id` capture (which needs `.withResponse()`) is left as a
follow-up — see review thread on PostHog#3306.

Addresses review feedback on PostHog#3306:
- restructure `system_fingerprint` / `request_id` under `$ai_provider_metadata`
- fix the TS2353 build error on the test mock by widening `ChatCompletion`
  with `{ _request_id?: string }`
- consolidate the duplicated `(result as any)._request_id` cast into
  `extractRequestId` with an explanatory comment in one place
- add Responses API `$ai_completion_id` / `request_id` test coverage

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@johnsykim johnsykim force-pushed the feat/ai-completion-id branch from ce4c721 to bf5b13a Compare May 19, 2026 23:01
@johnsykim johnsykim changed the title feat(ai): add $ai_completion_id, $ai_system_fingerprint, $ai_request_id to OpenAI events feat(ai): add $ai_completion_id and $ai_provider_metadata to OpenAI events May 19, 2026
@johnsykim johnsykim marked this pull request as ready for review May 19, 2026 23:02
@johnsykim

Copy link
Copy Markdown
Contributor Author

@carlos-marchal-ph @richardsolomou Revision is up, ready for review again. Brief summary of what changed:

  • Rebased onto the new captureAiGeneration primitive (sendEventToPosthog no longer exists since feat(posthog-ai): add captureAiGeneration as canonical capture primitive #3499), so the implementation is rewritten cleanly on top of that. completionId and providerMetadata are now options on CaptureAiGenerationOptions, available to anyone calling the primitive directly.
  • system_fingerprint and request_id moved under $ai_provider_metadata as Richard suggested. $ai_completion_id stays at the top level since response IDs generalize across providers.
  • TS2353 build error fixed. pnpm build and pnpm lint both pass.
  • (result as any)._request_id consolidated into extractRequestId with a doc comment in one place (packages/ai/src/openai/utils.ts).
  • Streaming request_id is left as a follow-up as you suggested — needs .withResponse() threading.
  • Test coverage extended: dedicated unit tests for the new helpers, $ai_completion_id + $ai_provider_metadata asserted on non-streaming chat, streaming chat, responses.parse, responses.create (non-streaming), and the Azure equivalent. This also closes Greptile's outstanding comment about missing Responses API request_id coverage.

Full revision notes in the updated PR description above. Thanks for hanging in there while I got around to this!

@johnsykim

Copy link
Copy Markdown
Contributor Author

@greptile review

@greptile-apps

greptile-apps Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
packages/ai/tests/openai-utils.test.ts:29-38
**Non-parameterised test blocks bundle multiple independent cases**

The `buildProviderMetadata` tests pack multiple distinct inputs into the same `it` block. Per the repo's simplicity rule ("we always prefer parameterised tests"), each input/output pair should be its own case in an `it.each` table. As written, a failure only points to the block, not the specific input that broke.

For example, "omits keys whose value is missing" tests `{ systemFingerprint }` and `{ requestId }` separately, and "returns undefined when there is nothing to report" tests `{}`, `{ …undefined }`, and `{ …null }` — these are three distinct edge-cases that would be clearer as three rows in an `it.each`.

### Issue 2 of 2
packages/ai/src/openai/index.ts:287-305
**Error capture path in streaming skips `completionId` even when it is available**

`completionIdFromResponse` and `systemFingerprintFromResponse` are declared in the outer scope and may have been populated by one or more chunks before the exception fires. However, the `catch` block that emits the error event never passes these values. When the stream fails mid-flight (after the first chunk carrying `chunk.id`), the error event has no `$ai_completion_id`, making it impossible to correlate the error against OpenAI's Logs dashboard — the exact use-case this PR enables.

The same gap exists in the equivalent streaming catch block in `azure.ts`.

Reviews (5): Last reviewed commit: "feat(ai): add $ai_completion_id and $ai_..." | Re-trigger Greptile

@johnsykim johnsykim marked this pull request as draft May 19, 2026 23:08
- Streaming error path now surfaces accumulated completion metadata.
  When a stream fails mid-flight after consuming chunks that carried
  `chunk.id` / `chunk.system_fingerprint`, the error event was being
  emitted without `$ai_completion_id` / `$ai_provider_metadata` — the
  exact correlation IDs this PR is meant to enable. Hoisted the
  accumulators above the `try` block and pass them in the `catch`
  capture for both `index.ts` and `azure.ts`, chat and Responses.

- Convert the `openai-utils` tests to parameterised `it.each` tables
  per the repo's "always prefer parameterised tests" convention; each
  input/output pair is now its own row so a failure points to the
  specific case that broke.
@johnsykim

Copy link
Copy Markdown
Contributor Author

Addressed both Greptile points on 1547db60:

  1. Streaming error path now surfaces accumulated completion metadata. Hoisted completionIdFromResponse / systemFingerprintFromResponse above the try and pass them in the catch capture (index.ts + azure.ts, chat and Responses). Mid-flight stream failures now carry whatever correlation IDs the consumed chunks already provided — the exact case Greptile flagged.
  2. Parameterised the openai-utils tests. Converted both describe blocks to it.each tables so each input/output pair is its own row.

@greptile review

@greptile-apps

greptile-apps Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor

Reviews (6): Last reviewed commit: "fix(ai): address Greptile feedback on th..." | Re-trigger Greptile

@johnsykim johnsykim marked this pull request as ready for review May 19, 2026 23:18
@greptile-apps

greptile-apps Bot commented May 19, 2026

Copy link
Copy Markdown
Contributor
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/ai/tests/openai-utils.test.ts:11-14
`extractRequestId` returns the raw value of `_request_id` via `?? undefined`, which converts `null` but **not** an empty string. So `extractRequestId({ _request_id: '' })` returns `''`. `buildProviderMetadata` then silently drops it (truthy check), so behavior is still correct — but the interaction isn't exercised anywhere in the test suite. Adding the case documents the contract explicitly and guards against a future change to the truthy check silently breaking it.

```suggestion
    ['returns undefined for numeric input', 42, undefined],
    ['returns empty string when `_request_id` is empty string', { _request_id: '' }, ''],
  ])('%s', (_name, input, expected) => {
    expect(extractRequestId(input)).toBe(expected)
  })
```

Reviews (7): Last reviewed commit: "fix(ai): address Greptile feedback on th..." | Re-trigger Greptile

@github-actions github-actions Bot removed the stale label May 20, 2026
@github-actions

Copy link
Copy Markdown
Contributor

This PR hasn't seen activity in a week! Should it be merged, closed, or further worked on? If you want to keep it open, post a comment or remove the stale label – otherwise this will be closed in another week.

@github-actions github-actions Bot added the stale label May 27, 2026
@johnsykim

Copy link
Copy Markdown
Contributor Author

@carlos-marchal-ph Let me know if the PR is good to go!

@github-actions github-actions Bot removed the stale label May 28, 2026
Generated-By: PostHog Code
Task-Id: 672e907b-e741-4f0e-abf2-76722b296982

@carlos-marchal-ph carlos-marchal-ph left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just pushed a tiny style fix and will go ahead and merge it!

@carlos-marchal-ph carlos-marchal-ph requested review from richardsolomou and removed request for richardsolomou May 28, 2026 16:13
@carlos-marchal-ph carlos-marchal-ph merged commit 1a2f8a8 into PostHog:main May 29, 2026
44 of 48 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants